Abstract

A local graph partitioning algorithm finds a set of vertices with small conductance (i.e. a sparse cut) by adaptively exploring part of a large graph G, starting from a specified vertex. For the algorithm to be local, its complexity must be bounded in terms of the size of the set that it outputs, with at most a weak dependence on the number n of vertices in G. Previous local partitioning algorithms find sparse cuts using random walks and personalized PageRank. In this paper, we introduce a randomized local partitioning algorithm that finds a sparse cut by simulating the volume-biased evolving set process, which is a Markov chain on sets of vertices. We prove that for any set of vertices A that has conductance at most $\phi$, for at least half of the starting vertices in A our algorithm will output (with probability at least half) a set of conductance $O(\phi^{1/2}\log^{1/2} n)$. We prove that for a given run of the algorithm, the expected ratio between its computational complexity and the volume of the set that it outputs is $O(\phi^{-1/2}\,\mathrm{polylog}(n))$. In comparison, the best previous local partitioning algorithm, due to Andersen, Chung, and Lang, has the same approximation guarantee, but a larger ratio of $O(\phi^{-1}\,\mathrm{polylog}(n))$ between the complexity and output volume. Using our local partitioning algorithm as a subroutine, we construct a fast algorithm for finding balanced cuts. Given a fixed value of $\phi$, the resulting algorithm has complexity $(m + n\phi^{-1/2}) \cdot O(\mathrm{polylog}(n))$ and returns a cut with conductance $O(\phi^{1/2}\log^{1/2} n)$ and volume at least half the largest volume of any set with conductance at most $\phi$.



1 Introduction 



A local graph partitioning algorithm solves a targeted version of the classic sparsest cut problem; it 
finds a set with small conductance by adaptively examining a small subset of the input graph near 
a specified starting vertex. Such algorithms are useful for finding target clusters in large graphs, 
and for quickly finding collections of small clusters. They have been applied in practice to probe 
the community structure of social and information networks [10, 2, 12], and have been used as 
subroutines to design fast algorithms for other partitioning problems [20, 22]. 

Spielman and Teng introduced a local partitioning algorithm with a remarkable approximation 
guarantee and bound on its computational complexity [20, 22]. Their algorithm has a bounded 
work/volume ratio, which is the ratio between the work performed by the algorithm on a given 
run (meaning the number of operations or computational complexity), and the volume of the set 
it outputs. It also has a local approximation guarantee, which states (roughly) that if the starting 
vertex is contained in a set with conductance at most $\phi$, then the algorithm will output a set with conductance at most $f(\phi)$. To find such a set, their algorithm computes a sequence of vectors that approximate the sequence of probability distributions of a random walk from the starting vertex. The support of these vectors is kept small by removing tiny amounts of probability mass at each step. The most recent version of their algorithm [22] has local approximation guarantee $f(\phi) = O(\phi^{1/2}\log^{3/2} n)$ and work/volume ratio $O(\phi^{-2}\,\mathrm{polylog}(n))$. Andersen, Chung, and Lang [1] introduced a local partitioning algorithm that computes a single personalized PageRank vector rather than a sequence of random walk distributions. Their algorithm has approximation guarantee $O(\phi^{1/2}\log^{1/2} n)$ and work/volume ratio $O(\phi^{-1}\,\mathrm{polylog}(n))$.

The evolving set process (ESP) is a Markov chain whose states are subsets of the vertex set of
a graph. Its transition rule is a simple procedure that grows or shrinks the current set. Morris 
and Peres used the ESP, and the closely related volume-biased evolving set process (volume-biased 
ESP), to bound the mixing time of Markov chains in terms of their isoperimetric properties [17]. 
The volume-biased ESP is equivalent to the strong stationary dual of a random walk, which was 
introduced earlier by Diaconis and Fill [9]. Further applications of evolving sets were described 
in [15, 16]. In all of these results, evolving sets were used as analytical tools rather than algorithms. 

In this paper, we design a local partitioning algorithm called EvoCut based on evolving sets. 
Our algorithm simulates the volume-biased evolving set process until a certain stopping time is 
reached, then outputs the resulting set. We prove that the algorithm has local approximation 
guarantee $O(\phi^{1/2}\log^{1/2} n)$ and expected work/volume ratio $O(\phi^{-1/2}\,\mathrm{polylog}(n))$. To prove the
local approximation guarantee, we bound the rate of growth of the sets in the volume-biased ESP. 
In particular, we prove a lower bound that depends on the conductance of the sets observed by 
the process, and an upper bound that depends on the conductance of certain sets that contain the 
starting vertex. To bound the work/volume ratio, we combine a simple implementation trick with a 
nontrivial probabilistic analysis. We introduce an efficient method for simulating the volume-biased 
ESP that updates the vertices on the boundary of the current set and ignores the vertices in the 
interior. The work required to generate a sample path using this method is proportional to the 
cost of the sample path, which depends on the boundaries of the sets observed and the symmetric 
differences between successive sets. Using a martingale argument, we prove that the expected 
ratio between the cost of a sample path and the volume of the set output is $O(\phi^{-1/2}\,\mathrm{polylog}(n))$,
which bounds the work/volume ratio of our algorithm. The main theorem about EvoCut, which 
gives a precise statement of its work/volume ratio and local approximation guarantee, is stated in 
Section 1.1. In Table 1, we compare EvoCut with existing local partitioning algorithms. 
local partitioning algorithm      work/volume ratio                        approximation guarantee

Nibble (ST04) [20]                $O(\phi^{-5/3}\,\mathrm{polylog}(n))$    $O(\phi^{1/3}\log^{2/3} n)$
Nibble (ST08) [22]                $O(\phi^{-2}\,\mathrm{polylog}(n))$      $O(\phi^{1/2}\log^{3/2} n)$
PageRankNibble (ACL06) [1]        $O(\phi^{-1}\,\mathrm{polylog}(n))$      $O(\phi^{1/2}\log^{1/2} n)$
EvoCut (this paper)               $O(\phi^{-1/2}\,\mathrm{polylog}(n))$    $O(\phi^{1/2}\log^{1/2} n)$

Table 1: The work/volume ratio and approximation guarantee of known local partitioning algorithms. Here $n = |V|$ is the number of vertices in the graph.



One application of our local partitioning algorithm is a fast algorithm for finding balanced 
cuts. Spielman and Teng showed how to find a balanced cut in nearly linear time by repeatedly 
removing small sets from a graph using local partitioning [20]. Applying their technique with 
our algorithm yields an algorithm EvoPartition with the following properties. The algorithm 
has complexity $(m + n\phi^{-1/2}) \cdot O(\mathrm{polylog}(n))$, and it outputs a set of vertices whose conductance is $O(\phi^{1/2}\log^{1/2} n)$ and whose volume is at least half that of any set with conductance at most $\phi$, where $\phi$ is an input to the algorithm. Our algorithm is faster by a factor of roughly $\phi^{-1/2}$ than any existing algorithm that provides a nontrivial approximation guarantee for the balanced cut problem, but there are several algorithms that provide stronger approximation guarantees. The fastest previously known algorithms for finding balanced cuts are due to Arora-Kale [4] and Orecchia et al. [18]. These algorithms produce cuts with conductance $O(\phi\log n)$, and their computational complexity is dominated by the cost of solving polylogarithmically many single-commodity flow problems, namely $(m + \min(n/\phi, n^{3/2})) \cdot O(\mathrm{polylog}(n))$. In Section 5, we give a more detailed
description of EvoPartition and comparison with existing balanced cut algorithms. 

In the remainder of this section, we state the main theorem about our local partitioning algorithm EvoCut. In Section 2, we review the basic properties of the ESP and volume-biased ESP. In Section 3, we show how to find cuts with small conductance by generating sample paths from the volume-biased ESP. In Section 4, we describe an algorithm for simulating the volume-biased ESP. We then construct EvoCut and prove the main theorem about its work/volume ratio and local approximation guarantee. In Section 5, we describe the balanced cut algorithm EvoPartition.

1.1 Main result 

Let $G = (V, E)$ be a simple undirected graph with $n = |V|$ vertices and $m = |E|$ edges. The volume $\mu(S)$ of a set of vertices $S \subseteq V$ is defined to be

$$\mu(S) := \sum_{x \in S} d(x),$$

where $d(x)$ denotes the degree of the vertex $x$. The number of edges between two sets of vertices $S$ and $R$ is written $e(S, R)$. The complement of $S$ is written $S^c = V \setminus S$, and we define $\partial(S) = e(S, S^c)$ to be the number of edges leaving $S$. The conductance of a set of vertices $S$ is defined to be

$$\phi(S) := \partial(S)/\mu(S).$$

Notice that $\phi(V) = 0$. In other papers, the conductance of a set is sometimes defined to be $\partial(S)/\min(\mu(S), \mu(S^c))$. When a set is output by one of our partitioning algorithms, we will upper bound its volume by $(3/4)\mu(V)$, which ensures that the two definitions of conductance differ by only a constant factor. When the base is omitted, log means the natural logarithm $\log_e$.
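To make these definitions concrete, the following minimal Python sketch computes $\mu(S)$, $\partial(S)$ and $\phi(S)$. The adjacency-list representation (a dict mapping each vertex to its list of neighbours), the helper names and the small two-triangle graph are illustrative assumptions, not part of the paper.

```python
def volume(G, S):
    """mu(S): the sum of the degrees d(x) over x in S."""
    return sum(len(G[x]) for x in S)


def edge_boundary(G, S):
    """partial(S) = e(S, S^c): the number of edges with exactly one endpoint in S."""
    return sum(1 for x in S for y in G[x] if y not in S)


def conductance(G, S):
    """phi(S) = partial(S) / mu(S), the one-sided definition used in this paper."""
    return edge_boundary(G, S) / volume(G, S)


# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
G = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
S = {0, 1, 2}
print(volume(G, S), edge_boundary(G, S), conductance(G, S))  # 7 1 0.14285714285714285
print(conductance(G, set(G)))                                 # 0.0, since phi(V) = 0
```

On the example, $S = \{0, 1, 2\}$ has volume 7 and a single boundary edge, so $\phi(S) = 1/7$.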

Our main result is the analysis of the local partitioning algorithm EvoCut. The algorithm makes 
queries to an input graph $G = (V, E)$. We assume the graph supports the following types of queries, which would be easy to support in practice by storing the graph in random access memory. Given an arbitrary vertex $x$, let $N(x)$ be the set of vertices adjacent to $x$. We assume we can obtain a list of the vertices in $N(x)$ in time proportional to $|N(x)|$, and obtain a node sampled uniformly from $N(x)$ in constant time. The following is the main theorem, which describes the
work/volume ratio and local approximation guarantee of EvoCut. 

Theorem 1. EvoCut$(v, \phi)$ takes as input a starting vertex $v \in V$ and a target conductance $\phi \in (0, 1)$, and outputs a set of vertices. For a given run of the algorithm, let $S$ be the set of vertices it outputs, and let $w$ be the amount of work it performs (the computational complexity). Both $S$ and $w$ depend on randomness used by the algorithm.

1. Let $w/\mu(S)$ be the work/volume ratio. Then,

$$E[w/\mu(S)] = O(\phi^{-1/2}\log^{3/2}|V|).$$

2. If $A \subseteq V$ is a set of vertices that satisfies $\phi(A) \le \phi$ and $\mu(A) \le (2/3)\mu(V)$, then there is a subset $A' \subseteq A$ with volume at least $\mu(A)/2$ such that whenever $v \in A'$, with probability at least $1/2$ the output set $S$ satisfies all of the following:

(a) $\phi(S) = O(\phi^{1/2}\log^{1/2}|V|)$,

(b) $\mu(S) \le (3/4)\mu(V)$,

(c) $\mu(S \cap A) \ge (9/10)\mu(S)$.

The description of EvoCut and the proof of Theorem 1 are given in Section 4.

2 Preliminaries

In this section we describe the ESP and volume-biased ESP, the connections between them, and their relationship to conductance and random walks. We use the terminology and basic results from [17]. The coupling described in Section 2.5 is due to Diaconis and Fill [9]. The volume-biased ESP is equivalent to one of the strong stationary duals constructed in [9], which predates the ESP and volume-biased ESP.



2.1 Random Walk 



A random walk on the graph $G$ is a Markov chain defined by the transition kernel

$$p(x, y) = \begin{cases} 1/2 & \text{if } x = y,\\ 1/(2d(x)) & \text{if } \{x, y\} \in E,\\ 0 & \text{otherwise.} \end{cases}$$

Note that this is a "lazy" walk with holding probability 1/2. Given a set $S$, we let $p(x, S)$ denote the probability of transitioning from $x$ to some vertex in $S$,

$$p(x, S) := \sum_{y \in S} p(x, y) = \frac{1}{2}\left(\mathbf{1}(x \in S) + \frac{e(x, S)}{d(x)}\right).$$

Here, $\mathbf{1}(\cdot)$ denotes the indicator function for an event. We write $p^t(x, y)$ for the $t$-step transition probabilities, and let $P_x$ denote the probability measure for the Markov chain of a random walk started from $x$.
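A small sketch of the lazy kernel and of $p(x, S)$, assuming the same adjacency-list representation as in Section 1.1 (restated so the block runs on its own). Exact fractions are used so that the closed form for $p(x, S)$ can be compared exactly with the sum of $p(x, y)$ over $y \in S$; the function names and the example graph are hypothetical.

```python
from fractions import Fraction


def p(G, x, y):
    """Lazy walk kernel p(x, y): hold w.p. 1/2, otherwise step to a uniform neighbour."""
    if x == y:
        return Fraction(1, 2)
    if y in G[x]:
        return Fraction(1, 2 * len(G[x]))
    return Fraction(0)


def p_set(G, x, S):
    """p(x, S) = (1/2) * (1(x in S) + e(x, S) / d(x))."""
    e_xS = sum(1 for y in G[x] if y in S)
    return Fraction(1, 2) * (int(x in S) + Fraction(e_xS, len(G[x])))


# Two triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
G = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
S = {0, 1, 2}
for x in G:
    # The closed form agrees with summing the kernel over S.
    assert p_set(G, x, S) == sum(p(G, x, y) for y in S)
print(p_set(G, 2, S), p_set(G, 3, S))  # 5/6 1/6
```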



2.2 The Evolving Set Process 

The evolving set process (ESP) is a Markov chain on subsets of the vertex set $V$. Given the current state $S$, the next state $S_1$ is chosen by the following rule: pick a threshold $U$ uniformly at random from the interval $[0, 1]$, and let

$$S_1 = \{y : p(y, S) \ge U\}. \qquad (1)$$

Notice that $\emptyset$ and $V$ are absorbing states for the process. Given a starting state $S_0 \subseteq V$, we write $P_{S_0}(\cdot) := P(\cdot \mid S_0)$ to denote the probability measure for the ESP Markov chain started from $S_0$. Similarly, we write $E_{S_0}(\cdot)$ for the expectation. For a singleton set, we use the shorthand $P_x(\cdot) = P_{\{x\}}(\cdot)$. We define the transition kernel $K(S, S') = P_S(S_1 = S')$.
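The transition rule (1) is short to express once $p(y, S)$ is available. This sketch (hypothetical names, same adjacency-list assumption) scans every vertex of the graph for clarity, which costs $O(n)$ per step; the simulation in Section 4 is designed precisely to avoid that cost. Passing the threshold explicitly makes the monotone dependence of $S_1$ on $U$ visible.

```python
import random
from fractions import Fraction


def p_set(G, x, S):
    """p(x, S) = (1/2) * (1(x in S) + e(x, S) / d(x)) for the lazy walk."""
    e_xS = sum(1 for y in G[x] if y in S)
    return Fraction(1, 2) * (int(x in S) + Fraction(e_xS, len(G[x])))


def esp_step(G, S, U):
    """ESP transition (1) with threshold U: S_1 = {y : p(y, S) >= U}."""
    return {y for y in G if p_set(G, y, S) >= U}


def esp_step_random(G, S, rng):
    """Draw U uniformly from [0, 1) and apply the threshold rule."""
    return esp_step(G, S, rng.random())


G = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
S = {0, 1, 2}
print(esp_step(G, S, Fraction(9, 10)))  # {0, 1}: the set shrinks when U > 5/6
print(esp_step(G, S, Fraction(1, 10)))  # {0, 1, 2, 3}: the set grows when U <= 1/6
print(esp_step_random(G, S, random.Random(0)))
```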

2.3 Evolving sets and conductance 

The following propositions relate the conductance of a set in the ESP to the change in volume in the next step. The first proposition strengthens the fact that the sequence $(\mu(S_t))_{t \ge 0}$ is a martingale.

Proposition 1. Let $U$ be the uniform random variable used to generate $S_1$ from $S$ in the ESP. Then,

$$E_S(\mu(S_1) \mid U \le 1/2) = \mu(S) + \partial(S) = \mu(S)(1 + \phi(S)),$$
$$E_S(\mu(S_1) \mid U > 1/2) = \mu(S) - \partial(S) = \mu(S)(1 - \phi(S)).$$
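Proposition 1 can be checked exactly on a small graph. Conditioned on $U \le 1/2$, a vertex $y$ belongs to $S_1$ with probability $\min(1, 2p(y, S))$, and conditioned on $U > 1/2$ with probability $\max(0, 2p(y, S) - 1)$; weighting these by $d(y)$ gives the two conditional expectations. The sketch below (hypothetical names, example graph chosen for illustration) compares them with $\mu(S) \pm \partial(S)$.

```python
from fractions import Fraction


def p_set(G, x, S):
    """p(x, S) = (1/2) * (1(x in S) + e(x, S) / d(x)) for the lazy walk."""
    e_xS = sum(1 for y in G[x] if y in S)
    return Fraction(1, 2) * (int(x in S) + Fraction(e_xS, len(G[x])))


def conditional_volumes(G, S):
    """Exact E_S[mu(S_1) | U <= 1/2] and E_S[mu(S_1) | U > 1/2] in the ESP."""
    low = high = Fraction(0)
    for y in G:
        q = p_set(G, y, S)
        low += len(G[y]) * min(Fraction(1), 2 * q)       # P(U <= q | U <= 1/2)
        high += len(G[y]) * max(Fraction(0), 2 * q - 1)  # P(U <= q | U > 1/2)
    return low, high


G = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
S = {0, 1, 2}
vol = sum(len(G[x]) for x in S)                      # mu(S) = 7
bnd = sum(1 for x in S for y in G[x] if y not in S)  # partial(S) = 1
low, high = conditional_volumes(G, S)
print(low, high, vol + bnd, vol - bnd)               # 8 6 8 6
```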



Proposition 2. The growth gauge $\psi(S)$ of a set $S$ is defined by the following equation:

$$1 - \psi(S) := E_S\sqrt{\frac{\mu(S_1)}{\mu(S)}}.$$

For any set $S \subseteq V$, the growth gauge and conductance satisfy $\psi(S) \ge \phi(S)^2/8$.

Proofs of Propositions 1 and 2 appear in [17], but the constants stated there differ from ours; their definition of conductance incorporates the holding probability from the random walk, which makes it smaller than ours by a factor of 2.
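The growth gauge can be evaluated exactly for a small graph. As $U$ sweeps over $[0, 1]$, the set $S_1$ only changes at the distinct values of $p(y, S)$, so the law of $S_1$ is a finite list of (probability, set) pairs. The sketch below (hypothetical names, adjacency-list assumption) uses this to compute $\psi(S)$ and compare it with $\phi(S)^2/8$; the square root is taken in floating point.

```python
import math
from fractions import Fraction


def p_set(G, x, S):
    """p(x, S) = (1/2) * (1(x in S) + e(x, S) / d(x)) for the lazy walk."""
    e_xS = sum(1 for y in G[x] if y in S)
    return Fraction(1, 2) * (int(x in S) + Fraction(e_xS, len(G[x])))


def volume(G, S):
    return sum(len(G[x]) for x in S)


def conductance(G, S):
    return sum(1 for x in S for y in G[x] if y not in S) / volume(G, S)


def esp_law(G, S):
    """Exact law of S_1 as (probability, set) pairs: for U in (a, b], S_1 = {y : p(y, S) >= b}."""
    q = {y: p_set(G, y, S) for y in G}
    cuts = sorted({Fraction(0), Fraction(1)} | set(q.values()))
    return [(b - a, frozenset(y for y in G if q[y] >= b)) for a, b in zip(cuts, cuts[1:])]


def growth_gauge(G, S):
    """psi(S) = 1 - E_S[sqrt(mu(S_1) / mu(S))]."""
    mu = volume(G, S)
    return 1 - sum(float(w) * math.sqrt(volume(G, S1) / mu) for w, S1 in esp_law(G, S))


G = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
S = {0, 1, 2}
print(growth_gauge(G, S), conductance(G, S) ** 2 / 8)
```

For $S = \{0, 1, 2\}$ in the example graph this gives $\psi(S) \approx 0.0081$, above $\phi(S)^2/8 = 1/392 \approx 0.0026$.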
2.4 The Volume-Biased Evolving Set Process

The volume-biased evolving set process (volume-biased ESP) is a Markov chain on subsets of $V$ with the following transition kernel:

$$\hat K(S, S') = \frac{\mu(S')}{\mu(S)} K(S, S'), \qquad (2)$$

where $K(S, S')$ is the transition kernel for the ESP. We remark that $\hat K$ is the Doob $h$-transform of $K$ with respect to $\mu$ (see chapter 17 of [13]), and that the volume-biased ESP is equivalent to the ESP conditioned to absorb in the state $V$. Given a starting state $S_0$, we write $\hat P_{S_0}(\cdot) := \hat P(\cdot \mid S_0)$ for the probability measure of the Markov chain. Similarly, we write $\hat E_{S_0}(\cdot)$ for the expectation.
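Equation (2) can be evaluated directly from the exact ESP law by reweighting each outcome $S'$ by $\mu(S')/\mu(S)$. Because $(\mu(S_t))$ is a martingale for the ESP, the reweighted probabilities sum to 1, and the outcome $\emptyset$ receives weight 0. The sketch (hypothetical names, same example graph) prints $\hat K(\{0\}, \cdot)$.

```python
from fractions import Fraction


def p_set(G, x, S):
    """p(x, S) = (1/2) * (1(x in S) + e(x, S) / d(x)) for the lazy walk."""
    e_xS = sum(1 for y in G[x] if y in S)
    return Fraction(1, 2) * (int(x in S) + Fraction(e_xS, len(G[x])))


def volume(G, S):
    return sum(len(G[x]) for x in S)


def esp_kernel(G, S):
    """K(S, .) as a dict {S': probability}, derived from the threshold rule (1)."""
    q = {y: p_set(G, y, S) for y in G}
    cuts = sorted({Fraction(0), Fraction(1)} | set(q.values()))
    K = {}
    for a, b in zip(cuts, cuts[1:]):
        S1 = frozenset(y for y in G if q[y] >= b)  # S_1 for every U in (a, b]
        K[S1] = K.get(S1, Fraction(0)) + (b - a)
    return K


def volume_biased_kernel(G, S):
    """Equation (2): hat K(S, S') = mu(S') / mu(S) * K(S, S'); the empty set gets weight 0."""
    mu = volume(G, S)
    return {S1: Fraction(volume(G, S1), mu) * w
            for S1, w in esp_kernel(G, S).items() if volume(G, S1) > 0}


G = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
for S1, w in sorted(volume_biased_kernel(G, {0}).items(), key=lambda kv: len(kv[0])):
    print(sorted(S1), w)
# [0] 1/4
# [0, 1] 1/6
# [0, 1, 2] 7/12
```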

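The following proposition relates the volume-biased ESP and the ESP. This is a standard consequence of the Doob $h$-transform, but we include a proof for completeness.

Proposition 3. For any function $f$ and any starting set $S_0 \ne \emptyset$,

$$\hat E_{S_0}[f(S_0, \dots, S_t)] = E_{S_0}\left[\frac{\mu(S_t)}{\mu(S_0)} f(S_0, \dots, S_t)\right]. \qquad (3)$$

Proof. Assume that $S_0 \ne \emptyset$. Let $C$ be the collection of sample paths $(S_0, \dots, S_t)$ such that $\hat P_{S_0}(S_1, \dots, S_t) > 0$. If $(S_0, \dots, S_t) \in C$, then $\mu(S_j) > 0$ for all $j \in [0, t]$, so

$$\hat P_{S_0}(S_1, \dots, S_t) = \prod_{j=0}^{t-1} \frac{\mu(S_{j+1})}{\mu(S_j)} K(S_j, S_{j+1}) = \frac{\mu(S_t)}{\mu(S_0)} P_{S_0}(S_1, \dots, S_t).$$

Therefore,

$$\hat E_{S_0}[f(S_0, \dots, S_t)] = \sum_{(S_0, \dots, S_t) \in C} f(S_0, \dots, S_t)\,\hat P_{S_0}(S_1, \dots, S_t) = \sum_{(S_0, \dots, S_t) \in C} f(S_0, \dots, S_t)\,\frac{\mu(S_t)}{\mu(S_0)} P_{S_0}(S_1, \dots, S_t) = E_{S_0}\left[\frac{\mu(S_t)}{\mu(S_0)} f(S_0, \dots, S_t)\right],$$

where the last equality holds because every path with $P_{S_0}(S_1, \dots, S_t) > 0$ that is not in $C$ has been absorbed at $\emptyset$, so it has $\mu(S_t) = 0$. □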



2.5 The Diaconis-Fill Coupling 

Diaconis and Fill [9] introduced the following coupling between the random walk process and the volume-biased ESP. Let $(X_t, S_t)$ be a Markov chain, where $X_t$ is a vertex and $S_t \subseteq V$ is a subset of vertices. Let $P^*$ be the probability measure for the Markov chain. Given a starting vertex $x$, let $X_0 = x$ and $S_0 = \{x\}$, and let $P^*_x(\cdot) = P^*(\cdot \mid X_0 = x, S_0 = \{x\})$. Given the current state $(X_t, S_t)$, the transition probabilities are defined as follows.

$$P^*(X_{t+1} = y' \mid X_t = y, S_t = S) = p(y, y'),$$

$$P^*(S_{t+1} = S' \mid S_t = S, X_{t+1} = y') = \frac{K(S, S')\,\mathbf{1}(y' \in S')}{P(y' \in S_{t+1} \mid S_t = S)}.$$

In words, we first select $X_{t+1}$ according to the random walk transition kernel, then select $S_{t+1}$ according to the ESP transition kernel restricted to sets that contain $X_{t+1}$. We define the transition kernel $K^*((y, S), (y', S')) = P^*(X_1 = y', S_1 = S' \mid X_0 = y, S_0 = S)$.

The following proposition shows that $P^*$ is a coupling between the random walk process and the volume-biased ESP, and furthermore the distribution of $X_t$ conditioned on $(S_0, \dots, S_t)$ is the stationary distribution restricted to $S_t$. A proof of Proposition 4 is given in chapter 17 of [13].

Proposition 4 (Diaconis and Fill). Let $(X_t, S_t)$ be a Markov chain started from $(x, \{x\})$ with the transition kernel $K^*$.

1. The sequence $(X_t)$ is a Markov chain started from $x$ with the transition kernel $p(\cdot, \cdot)$.

2. The sequence $(S_t)$ is a Markov chain started from $\{x\}$ with transition kernel $\hat K$.

3. For any vertex $y$ and time $t \ge 0$,

$$P^*_x(X_t = y \mid S_0, \dots, S_t) = \mathbf{1}(y \in S_t)\,\frac{d(y)}{\mu(S_t)}.$$
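The coupling also gives a practical way to draw one step of the volume-biased ESP without computing $\hat K$. In the ESP, $y' \in S_{t+1}$ happens exactly when $U \le p(y', S_t)$, so conditioning on it amounts to drawing the threshold uniformly from $[0, p(y', S_t)]$. The sketch below (hypothetical names, floating-point thresholds, the two-triangle example graph) performs this coupled step from $(0, \{0\})$ many times. By part 2 of Proposition 4 the empirical frequencies of $S_1$ should approach $\hat K(\{0\}, \cdot)$, which on this graph equals $1/4$, $1/6$ and $7/12$ for $\{0\}$, $\{0, 1\}$ and $\{0, 1, 2\}$; by part 3, given $S_1 = \{0, 1, 2\}$, the walk should sit at vertex 2 about $3/7$ of the time.

```python
import random
from collections import Counter
from fractions import Fraction


def p_set(G, x, S):
    """p(x, S) = (1/2) * (1(x in S) + e(x, S) / d(x)) for the lazy walk."""
    e_xS = sum(1 for y in G[x] if y in S)
    return Fraction(1, 2) * (int(x in S) + Fraction(e_xS, len(G[x])))


def coupled_step(G, X, S, rng):
    """One transition (X_t, S_t) -> (X_{t+1}, S_{t+1}) of the Diaconis-Fill chain."""
    if rng.random() < 0.5:                 # lazy walk: move w.p. 1/2 to a uniform neighbour
        X = rng.choice(G[X])
    q = {y: float(p_set(G, y, S)) for y in G}
    Z = rng.uniform(0.0, q[X])             # ESP threshold conditioned on U <= p(X_{t+1}, S)
    return X, frozenset(y for y in G if q[y] >= Z)


G = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
rng = random.Random(1)
N = 20000
counts, joint = Counter(), Counter()
for _ in range(N):
    X1, S1 = coupled_step(G, 0, frozenset({0}), rng)
    assert X1 in S1                        # the walk always lies inside the current set
    counts[S1] += 1
    joint[X1, S1] += 1
for S1, c in sorted(counts.items(), key=lambda kv: len(kv[0])):
    print(sorted(S1), round(c / N, 3))     # close to 1/4, 1/6 and 7/12 respectively
```

This is the same threshold rule that GenerateSample uses in Section 4.1, where it is applied only to the boundary of the current set.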
3 Local partitioning using the volume-biased evolving set process 

In this section, we show how to find sets with small conductance by generating sample paths from 
the volume-biased ESP. The following theorem shows that if we start from a single starting vertex 
and simulate the volume-biased ESP for T steps, then one of the states observed is likely to have 
conductance $O(\sqrt{T^{-1}\log n})$. We can also prove that all the states observed are likely to have volume at most $(3/4)\mu(V)$, provided there exists a set $A \subseteq V$ that has conductance at most $(100T)^{-1}$, and that the starting vertex belongs to a certain subset of $A$.

Theorem 2. Fix an integer $T$, and let $A \subseteq V$ be any set of vertices that satisfies $\mu(A) \le (2/3)\mu(V)$ and $\phi(A) \le (100T)^{-1}$. Then, there exists a subset $A_T \subseteq A$ of volume at least $\mu(A)/2$ for which the following holds. If $x \in A_T$, then with probability at least $7/9$ a sample path $(S_1, \dots, S_T)$ from the volume-biased ESP started from $S_0 = \{x\}$ will satisfy all of the following:

1. $\phi(S_t) \le 3\theta_T$ for some $t \in [0, T]$, where $\theta_T = \sqrt{4T^{-1}\log\mu(V)}$.

2. $\mu(S_j) \le (3/4)\mu(V)$ for all $j \in [0, T]$.

3. $\mu(S_j \cap A) \ge (9/10)\mu(S_j)$ for all $j \in [0, T]$.

The proof of Theorem 2 is at the end of this section, after we present two necessary lemmas. Consider a sample path from the volume-biased ESP. The following lemma shows it is unlikely for the sample path to contain many sets with large conductance. Intuitively, this is true because at each step the quantity $\mu(S_t)$ tends to increase at a rate that depends on $\phi(S_t)$. Eventually the sample path will absorb in the state $V$, whose conductance is $\phi(V) = 0$.

Lemma 1. For any starting set $S_0 \ne \emptyset$ and any stopping time $\tau$ for the volume-biased ESP,

$$\hat E_{S_0}\left[\sum_{j=0}^{\tau-1}\phi(S_j)^2\right] \le 4\,\hat E_{S_0}\log\frac{\mu(S_\tau)}{\mu(S_0)} \le 4\log\mu(V).$$
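Proof. Recall from Proposition 2 that, by definition, $1 - \psi(S) := E_S\sqrt{\mu(S_1)/\mu(S)}$. Then, by Proposition 3 applied to a single step,

$$\hat E\left(\sqrt{\frac{\mu(S_{t-1})}{\mu(S_t)}} \,\middle|\, S_{t-1}\right) = E\left(\frac{\mu(S_t)}{\mu(S_{t-1})}\sqrt{\frac{\mu(S_{t-1})}{\mu(S_t)}} \,\middle|\, S_{t-1}\right) = E\left(\sqrt{\frac{\mu(S_t)}{\mu(S_{t-1})}} \,\middle|\, S_{t-1}\right) = 1 - \psi(S_{t-1}).$$

We define

$$M_t := F_t\sqrt{\frac{\mu(S_0)}{\mu(S_t)}}, \quad \text{where } F_t := \prod_{j=0}^{t-1}(1 - \psi(S_j))^{-1} \text{ and } F_0 := 1. \qquad (4)$$

We now verify that $(M_t)$ is a martingale in the volume-biased ESP:

$$\hat E(M_t \mid S_0, \dots, S_{t-1}) = F_t\,\hat E\left(\sqrt{\frac{\mu(S_0)}{\mu(S_t)}} \,\middle|\, S_{t-1}\right) = F_t\sqrt{\frac{\mu(S_0)}{\mu(S_{t-1})}}\,(1 - \psi(S_{t-1})) = F_{t-1}\sqrt{\frac{\mu(S_0)}{\mu(S_{t-1})}} = M_{t-1}.$$

Let $\tau$ be a stopping time for the volume-biased ESP. By the optional stopping theorem for nonnegative martingales (see [23]), we have $\hat E M_\tau \le M_0 = 1$. Then by Jensen's inequality, we have $\hat E\log M_\tau \le \log(\hat E M_\tau) \le 0$. Taking the log of (4),

$$\log F_\tau = \log M_\tau + \frac{1}{2}\log\frac{\mu(S_\tau)}{\mu(S_0)}. \qquad (5)$$

Since $(1 - \psi(S_j))^{-1} \ge e^{\psi(S_j)}$,

$$\log F_\tau = \log\prod_{j=0}^{\tau-1}(1 - \psi(S_j))^{-1} \ge \sum_{j=0}^{\tau-1}\psi(S_j). \qquad (6)$$

Taking expectations in (5) and (6) yields

$$\hat E\sum_{j=0}^{\tau-1}\psi(S_j) \le \hat E\log F_\tau = \hat E\log M_\tau + \frac{1}{2}\hat E\log\frac{\mu(S_\tau)}{\mu(S_0)} \le \frac{1}{2}\hat E\log\frac{\mu(S_\tau)}{\mu(S_0)}.$$

We apply the inequality $\phi(S_j)^2 \le 8\psi(S_j)$ from Proposition 2 to finish the proof,

$$\hat E\sum_{j=0}^{\tau-1}\phi(S_j)^2 \le 8\,\hat E\sum_{j=0}^{\tau-1}\psi(S_j) \le 4\,\hat E\log\frac{\mu(S_\tau)}{\mu(S_0)} \le 4\log\mu(V).$$

□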
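Corollary 1. Let $\theta_T := \sqrt{4T^{-1}\log\mu(V)}$. For any starting set $S_0$, integer $T$, and constant $c > 0$,

$$\hat P_{S_0}\left[\min_{j<T}\phi(S_j) \le \sqrt{c}\,\theta_T\right] \ge 1 - 1/c.$$

Proof. Fix $S_0$ and $T$, and consider a sample path $(S_0, \dots, S_T)$ from the volume-biased ESP. Let $\phi_j := \phi(S_j)$. Lemma 1, applied with the stopping time $\tau = T$, implies that $\hat E_{S_0}[\sum_{j<T}\phi_j^2] \le 4\log\mu(V)$. By Markov's inequality, the event $\sum_{j<T}\phi_j^2 \le 4c\log\mu(V)$ holds with probability at least $1 - 1/c$. If that event holds, then $\min_{j<T}\phi_j^2 \le T^{-1}\sum_{j<T}\phi_j^2 \le 4cT^{-1}\log\mu(V) = c\,\theta_T^2$, so $\min_{j<T}\phi_j \le \sqrt{c}\,\theta_T$. □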

Now that we know a sample path $(S_0, \dots, S_T)$ from the volume-biased ESP is likely to contain a set with small conductance, we are halfway done with the proof of Theorem 2. We still need to show that the sets observed in the volume-biased ESP are likely to have volume at most $(3/4)\mu(V)$. We start with a standard fact (Proposition 5) that bounds the probability that a lazy random walk escapes from a given set $A \subseteq V$. We then prove Lemma 2, which converts this standard fact into a statement about the volume-biased ESP. By combining these results we obtain a bound on the fraction of $S_j$ that is not contained in $A$. This yields a bound on the total volume of $S_j$.

Proposition 5. Let $(X_j)$ be a lazy random walk Markov chain starting from the vertex $x$. For any set $A \subseteq V$ and integer $T$, let

$$\mathrm{esc}(x, T, A) := P_x\left[\bigcup_{j=0}^{T}(X_j \notin A)\right],$$

which is the probability that a lazy random walk starting from $x$ leaves $A$ within the first $T$ steps, and define $A_T := \{x \in A : \mathrm{esc}(x, T, A) \le T\phi(A)\}$. Then, $\mu(A_T) \ge (1/2)\mu(A)$.

A proof of Proposition 5 appears in [22]. In Theorem 2.5 of that paper, it is shown that $\mu(A)^{-1}\sum_{x \in A} d(x)\,\mathrm{esc}(x, T, A) \le T\phi(A)/2$. This implies $\mu(A_T) \ge \mu(A)/2$, by Markov's inequality. The statement of Theorem 2.5 is slightly weaker due to a minor difference in the definition of $\phi$, but their proof establishes the stronger statement.
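For small graphs, $\mathrm{esc}(x, T, A)$ can be computed exactly by propagating the walk's probability mass for $T$ steps and discarding any mass that leaves $A$. The sketch below (hypothetical names, adjacency-list assumption) does this with exact fractions and then forms $A_T$. On the two-triangle example with $A = \{0, 1, 2\}$ and $T = 1$, vertex 2 escapes with probability $1/6 > T\phi(A) = 1/7$, so $A_T = \{0, 1\}$, whose volume 4 is indeed at least $\mu(A)/2 = 7/2$.

```python
from fractions import Fraction


def esc(G, x, T, A):
    """P_x[X_j not in A for some j <= T] for the lazy walk, computed exactly."""
    if x not in A:
        return Fraction(1)
    q = {x: Fraction(1)}  # q[y] = P_x[X_t = y and X_0, ..., X_t all in A]
    for _ in range(T):
        nxt = {}
        for y, mass in q.items():
            nxt[y] = nxt.get(y, 0) + mass / 2              # hold with probability 1/2
            for z in G[y]:
                if z in A:                                 # mass stepping outside A is dropped
                    nxt[z] = nxt.get(z, 0) + mass / (2 * len(G[y]))
        q = nxt
    return 1 - sum(q.values())


def good_starts(G, T, A):
    """A_T = {x in A : esc(x, T, A) <= T * phi(A)} as in Proposition 5."""
    vol = sum(len(G[x]) for x in A)
    phi_A = Fraction(sum(1 for x in A for y in G[x] if y not in A), vol)
    return {x for x in A if esc(G, x, T, A) <= T * phi_A}


G = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
A = {0, 1, 2}
print(esc(G, 2, 1, A), esc(G, 2, 2, A))  # 1/6 1/4
print(good_starts(G, 1, A))              # {0, 1}
```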

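Lemma 2. For any vertex $x$, set $A \subseteq V$, and integer $T \ge 0$, the following holds for all $\lambda > 0$,

$$\hat P_x\left[\max_{t \le T}\frac{\mu(S_t \setminus A)}{\mu(S_t)} > \lambda\,\mathrm{esc}(x, T, A)\right] \le \frac{1}{\lambda}.$$

Proof. Recall from Section 2.5 the coupling $P^*$ between the volume-biased ESP Markov chain $(S_t)$ and the random walk Markov chain $(X_t)$. This coupling has the property that for any $t \ge 0$,

$$P^*[X_t = y \mid S_0, \dots, S_t] = \frac{d(y)}{\mu(S_t)}\,\mathbf{1}(y \in S_t).$$

Fix a value $\gamma \in [0, 1]$ and let $\tau$ be the first time $t$ when $\mu(S_t \setminus A) > \gamma\mu(S_t)$, or let $\tau = \infty$ if this does not occur. Consider the probability that $X_\tau \notin A$, conditioned on $S_\tau$:

$$P^*[X_\tau \notin A \mid S_\tau = S] = \sum_{y \in S \setminus A}\frac{d(y)}{\mu(S)} = \frac{\mu(S \setminus A)}{\mu(S)}.$$

By the definition of $\tau$, we have $P^*[X_\tau \notin A \mid \tau \le T] > \gamma$, so

$$\mathrm{esc}(x, T, A) = P^*\left[\bigcup_{t=0}^{T}(X_t \notin A)\right] \ge P^*[X_\tau \notin A,\ \tau \le T] = P^*[X_\tau \notin A \mid \tau \le T]\,P^*[\tau \le T] \ge \gamma\,P^*[\tau \le T].$$

Therefore

$$\hat P_x\left[\max_{t \le T}\frac{\mu(S_t \setminus A)}{\mu(S_t)} > \gamma\right] = P^*[\tau \le T] \le \frac{\mathrm{esc}(x, T, A)}{\gamma}.$$

The lemma follows by taking $\gamma = \lambda\,\mathrm{esc}(x, T, A)$. □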

We now combine the results from this section to prove Theorem 2. 

Proof of Theorem 2. Let $A \subseteq V$ be a set and $T$ be an integer that satisfy $\phi(A) \le (100T)^{-1}$. Let $A_T \subseteq A$ be the set defined in Proposition 5, and assume that $x \in A_T$. Let $(S_0, \dots, S_T)$ be a sample path from the volume-biased ESP started from $S_0 = \{x\}$.

By Corollary 1 with $c = 9$, with probability at least $1 - 1/9$ there exists some $t < T$ for which $\phi(S_t) \le 3\theta_T$. The definition of $A_T$ implies that $\mathrm{esc}(x, T, A) \le T\phi(A) \le 1/100$. Lemma 2 with $\lambda = 10$ then shows that with probability at least $9/10$,

$$\frac{\mu(S_j \setminus A)}{\mu(S_j)} \le 10\,\mathrm{esc}(x, T, A) \le \frac{1}{10} \quad \text{for all } j \in [0, T].$$

Since $\mu(A) \le (2/3)\mu(V)$, we have, for all $j \in [0, T]$,

$$\mu(S_j) \le \frac{10}{9}\mu(S_j \cap A) \le \frac{10}{9}\mu(A) \le \frac{10}{9}\cdot\frac{2}{3}\mu(V) < \frac{3}{4}\mu(V).$$

By the union bound, with probability at least $1 - 1/9 - 1/10 \ge 7/9$ the sample path $(S_0, \dots, S_T)$ satisfies all the conclusions of the theorem. □
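The constants in this proof can be double-checked with exact rational arithmetic. The snippet only re-does the arithmetic of the argument (it does not simulate anything): with $c = 9$ in Corollary 1 and $\lambda = 10$ in Lemma 2, the volume factor is $(10/9)(2/3) = 20/27 < 3/4$, and the union bound leaves a success probability of $71/90 \ge 7/9$. The variable names are illustrative.

```python
from fractions import Fraction as F

c = 9                      # Corollary 1: some phi(S_t) <= sqrt(c) * theta_T = 3 * theta_T
lam = 10                   # Lemma 2 with lambda = 10
esc_bound = F(1, 100)      # esc(x, T, A) <= T * phi(A) <= 1/100 for x in A_T
outside = lam * esc_bound  # mu(S_j \ A) / mu(S_j) <= 1/10 on the good event
inside = 1 - outside       # hence mu(S_j intersect A) >= (9/10) mu(S_j)
vol_factor = (1 / inside) * F(2, 3)   # mu(S_j) <= (10/9) mu(A) <= (10/9)(2/3) mu(V)
success = 1 - F(1, c) - F(1, lam)     # union bound over the two failure events
print(vol_factor, success)            # 20/27 71/90
```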



4 Simulating the volume-biased evolving set process 

In the beginning of this section, we describe a subroutine GenerateSample that simulates the volume-biased ESP until a certain stopping time $\tau$ is reached, generating a sample path $(S_0, \dots, S_\tau)$ and producing as output the set $S_\tau$. We choose $\tau$ to be the first time that $S_t$ has sufficiently small conductance, or that the work performed exceeds a specified limit. We assign a cost to each sample path that depends on the boundaries of the sets in the path, and the difference in volume between successive sets. We then show that the work performed by GenerateSample is $O(\mathrm{polylog}(n))$ times the cost of the sample path it generates.

The algorithm EvoCut, which we construct at the end of this section, outputs the set $S_\tau$ computed by GenerateSample. To bound the work/volume ratio of EvoCut, we directly bound the expected ratio between the cost of $(S_0, \dots, S_\tau)$ and the volume of $S_\tau$.
Definition 1. The cost of a sample path $(S_0, \dots, S_t)$ is

$$\mathrm{cost}(S_0, \dots, S_t) := \mu(S_0) + \sum_{i=1}^{t}\big(\mu(S_i \triangle S_{i-1}) + \partial(S_{i-1})\big), \qquad (7)$$

where $\triangle$ denotes the symmetric difference between two sets.

Definition 2. Given integers $T$ and $B$, let $\tau(T, B)$ be the first time one of the following occurs:

1. $\phi(S_t) < \theta_T$.

2. $t = T$ or $\mathrm{cost}(S_0, \dots, S_t) > B$.
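Definitions 1 and 2 translate directly into code. The sketch below (hypothetical helper names, adjacency-list assumption, the two-triangle example graph) evaluates $\mathrm{cost}(\cdot)$ with the per-step cost $\mu(S_i \triangle S_{i-1}) + \partial(S_{i-1})$ that also appears in the proof of Theorem 4, and applies the stopping rule $\tau(T, B)$ to an already generated path, with $\theta_T = \sqrt{4T^{-1}\log\mu(V)}$ as in Theorem 2. It recomputes the cost of every prefix, which is quadratic and only meant to mirror the definitions; GenerateSample accumulates the cost incrementally instead.

```python
import math


def volume(G, S):
    return sum(len(G[x]) for x in S)


def edge_boundary(G, S):
    return sum(1 for x in S for y in G[x] if y not in S)


def cost(G, path):
    """Definition 1: mu(S_0) + sum over i of [mu(S_i delta S_{i-1}) + partial(S_{i-1})]."""
    total = volume(G, path[0])
    for prev, cur in zip(path, path[1:]):
        total += volume(G, prev ^ cur) + edge_boundary(G, prev)
    return total


def stopping_time(G, path, T, B):
    """Definition 2: first t with phi(S_t) < theta_T, t == T, or cost(S_0..S_t) > B."""
    theta_T = math.sqrt(4 * math.log(volume(G, set(G))) / T)
    for t, S in enumerate(path):
        if edge_boundary(G, S) / volume(G, S) < theta_T or t == T or cost(G, path[:t + 1]) > B:
            return t
    return None  # the rule has not fired on this (prefix of a) path


G = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
path = [{0}, {0, 1}, {0, 1, 2}]
print(cost(G, path))                       # 11
print(stopping_time(G, path, 100, 10**6))  # 2, since phi({0,1,2}) = 1/7 < theta_100 (about 0.32)
print(stopping_time(G, path, 100, 5))      # 1, since cost(S_0, S_1) = 6 > 5
```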

The following theorem shows that GenerateSample generates a sample path from the volume-biased ESP with stopping rule $\tau(T, B)$, and that its complexity is at most $O(\log n)$ times the cost of the sample path it generates. The complexity is also bounded by $O(B\log n)$.

Theorem 3. The algorithm GenerateSample$(x, T, B)$ takes as input a vertex $x$, an integer $T > 0$, and an integer $B > 0$. Let $S_0 = \{x\}$ and let $\tau = \tau(T, B)$. The algorithm generates a sample path $(S_0, \dots, S_\tau)$ and outputs the last set $S_\tau$. The following hold.

1. The probability that GenerateSample generates the sample path $(S_0, \dots, S_\tau)$ is $\hat P_x[S_0, \dots, S_\tau]$.

2. If GenerateSample generates $(S_0, \dots, S_\tau)$, then its output is $S_\tau$ and its complexity is $O(\log n)\cdot\min(B, \mathrm{cost}(S_0, \dots, S_\tau))$.

The description of GenerateSample and the proof of Theorem 3 are given in Section 4.1. At a high level, GenerateSample simulates the volume-biased ESP by updating the boundary of the current set at each step. We define $\delta(S)$ to be the two-sided vertex boundary of $S$,

$$\delta(S) = \{y : y \in S \wedge e(y, S^c) > 0\} \cup \{y : y \in S^c \wedge e(y, S) > 0\}.$$

The algorithm maintains a dynamic data structure that stores the current state $S$, its two-sided boundary $\delta(S)$, and the values of $p(y, S)$ for vertices in the two-sided boundary. This allows the algorithm to ignore the vertices in the interior of the set when selecting the next state. The complexity of GenerateSample is dominated by the work required to iterate over the boundary of the current set, select the next state using the stored values of $p(y, S)$, and update the set-with-boundary data structure.
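The reason the interior can be ignored is that, for any threshold $U \in (0, 1]$, a vertex of $S$ with no neighbour outside $S$ has $p(y, S) = 1$ and stays in, while a vertex outside $S$ with no neighbour in $S$ has $p(y, S) = 0$ and stays out; hence $S_1 \triangle S \subseteq \delta(S)$. The sketch below (hypothetical names, adjacency-list assumption, the two-triangle example graph) computes $\delta(S)$ and checks this containment for a few thresholds.

```python
from fractions import Fraction


def p_set(G, x, S):
    """p(x, S) = (1/2) * (1(x in S) + e(x, S) / d(x)) for the lazy walk."""
    e_xS = sum(1 for y in G[x] if y in S)
    return Fraction(1, 2) * (int(x in S) + Fraction(e_xS, len(G[x])))


def two_sided_boundary(G, S):
    """delta(S): vertices of S with a neighbour outside S, plus vertices outside S with a neighbour in S."""
    inner = {y for y in S if any(z not in S for z in G[y])}
    outer = {z for y in S for z in G[y] if z not in S}
    return inner | outer


G = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
S = {0, 1, 2}
print(two_sided_boundary(G, S))  # {2, 3}
# For every threshold U in (0, 1], the ESP step changes S only inside delta(S).
for U in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10), Fraction(1)):
    S1 = {y for y in G if p_set(G, y, S) >= U}
    assert (S1 ^ S) <= two_sided_boundary(G, S)
```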

In the following theorem, we bound the expected ratio between the cost of the sample path $(S_0, \dots, S_\tau)$ and the volume of $S_\tau$, which bounds the work/volume ratio for EvoCut.

Theorem 4. For any starting set $S_0$ and any stopping time $\tau$ that is bounded by $T$, we have

$$\hat E_{S_0}\left[\frac{\mathrm{cost}(S_0, \dots, S_\tau)}{\mu(S_\tau)}\right] \le 1 + 4\sqrt{T\log\mu(V)}.$$

The proof of Theorem 4 is the technical highlight of our analysis. The proof uses a martingale argument and the transform between the ESP and volume-biased ESP. It bounds the work/volume ratio in a more direct way than previous local partitioning algorithms, which required the user to guess the approximate volume of the output set [20, 22, 1].
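Proof. Let $c_j$ be the cost of the step in which $S_j$ is selected,

$$c_j := \mu(S_j \triangle S_{j-1}) + \partial(S_{j-1}).$$

We define $c_0 := \mu(S_0)$, and recall that $\mathrm{cost}(S_0, \dots, S_t) = c_0 + \dots + c_t$. Consider the conditional expectation of $c_j$ in the ESP. We have

$$E(c_j \mid S_{j-1}) = E(\mu(S_j \triangle S_{j-1}) \mid S_{j-1}) + \partial(S_{j-1}).$$

We now compute the expected volume of the symmetric difference. Let $U$ be the uniform random threshold used to select $S_j$ from $S_{j-1}$ in the ESP, and recall that $S_j \subseteq S_{j-1}$ when $U > 1/2$, and $S_{j-1} \subseteq S_j$ when $U \le 1/2$. By Proposition 1,

$$E(\mu(S_j \triangle S_{j-1}) \mid S_{j-1}) = E(|\mu(S_j) - \mu(S_{j-1})| \mid S_{j-1}) = \frac{1}{2}E\big(\mu(S_j) - \mu(S_{j-1}) \mid S_{j-1}, U \le \tfrac{1}{2}\big) + \frac{1}{2}E\big(\mu(S_{j-1}) - \mu(S_j) \mid S_{j-1}, U > \tfrac{1}{2}\big) = \partial(S_{j-1}).$$

Therefore, $E(c_j \mid S_{j-1}) = 2\partial(S_{j-1})$.

Let $R_t := \mathrm{cost}(S_0, \dots, S_t)/\mu(S_t)$, and consider the conditional expectation of $R_t$ in the volume-biased ESP. Writing $\phi_j := \phi(S_j)$, by Proposition 3,

$$\hat E(R_t \mid S_0, \dots, S_{t-1}) = E\left(\frac{\mathrm{cost}(S_0, \dots, S_t)}{\mu(S_t)}\cdot\frac{\mu(S_t)}{\mu(S_{t-1})} \,\middle|\, S_0, \dots, S_{t-1}\right) = \frac{1}{\mu(S_{t-1})}\big(\mathrm{cost}(S_0, \dots, S_{t-1}) + E(c_t \mid S_0, \dots, S_{t-1})\big) = \frac{1}{\mu(S_{t-1})}\big(\mathrm{cost}(S_0, \dots, S_{t-1}) + 2\partial(S_{t-1})\big) = R_{t-1} + 2\phi_{t-1}.$$

We define

$$M_t := R_t - Q_t, \quad \text{where } Q_t := 1 + 2\sum_{j=0}^{t-1}\phi_j.$$

By construction, $(M_t)$ is a martingale in the volume-biased ESP. Notice that $M_0 = R_0 - 1 = 0$. Now let $\tau$ be an arbitrary stopping time that is bounded by $T$. By the optional stopping theorem for martingales (see [23]), we have $\hat E[M_\tau] = M_0 = 0$, so $\hat E[R_\tau] = \hat E[Q_\tau]$. By Cauchy-Schwarz,

$$\sum_{j=0}^{T-1}\phi_j \le \sqrt{T}\left(\sum_{j=0}^{T-1}\phi_j^2\right)^{1/2}.$$

By Jensen's inequality, and since $Q_t$ is nondecreasing in $t$,

$$\hat E[R_\tau] = \hat E[Q_\tau] \le \hat E[Q_T] = 1 + 2\,\hat E\sum_{j=0}^{T-1}\phi_j \le 1 + 2\sqrt{T}\,\hat E\left(\sum_{j=0}^{T-1}\phi_j^2\right)^{1/2} \le 1 + 2\sqrt{T}\left(\hat E\sum_{j=0}^{T-1}\phi_j^2\right)^{1/2} \le 1 + 4\sqrt{T\log\mu(V)}.$$

In the last step, we used Lemma 1. □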



We can now state the local partitioning algorithm EvoCut and prove the main theorem. 
EvoCut$(v, \phi)$:

1. Let $T = \lfloor \phi^{-1}/100 \rfloor$. If $T = 0$, then output $\{v\}$.

2. Let $S$ = GenerateSample$(v, T, \infty)$, and output $S$.

Proof of Theorem 1. If $T = 0$, then we output the set $\{v\}$, which trivially satisfies the theorem. Assume that $T \ge 1$. Run GenerateSample$(v, T, \infty)$, let $w$ be the work (the complexity of the algorithm on this particular run), and let $(S_0, \dots, S_\tau)$ be the sample path generated. Since $\phi(A) \le \phi \le 1/(100T)$, Theorem 2 shows that with probability at least $7/9$, $S_\tau$ satisfies $\mu(S_\tau) \le (3/4)\mu(V)$, $\mu(S_\tau \cap A) \ge (9/10)\mu(S_\tau)$, and $\phi(S_\tau) = O(\sqrt{T^{-1}\log m}) = O(\sqrt{\phi\log n})$.

The work $w$ is dominated by the complexity of GenerateSample, which by Theorem 3 is $O(\log n)\cdot\mathrm{cost}(S_0, \dots, S_\tau)$. By Theorem 4,

$$\hat E\left[\frac{\mathrm{cost}(S_0, \dots, S_\tau)}{\mu(S_\tau)}\right] = O(\sqrt{T\log n}) = O(\sqrt{\phi^{-1}\log n}),$$

so $E[w/\mu(S_\tau)] = O(\sqrt{\phi^{-1}\log n})\cdot O(\log n)$. □

We remark that the improvement in running time of EvoCut comes from ignoring the interior of the current set when simulating the volume-biased ESP. This type of optimization seems unique to evolving sets, and is not possible with random walks or personalized PageRank. We remark that, if we simulated the volume-biased ESP with a naive method that requires work roughly proportional to the sum of the volumes of the sets, the resulting work/volume ratio would be $O(T\log m)$ rather than $O(\sqrt{T}\log m)$. This would match the previous fastest local partitioning algorithm from [1].

4.1 GenerateSample

In this section we describe GenerateSample and prove Theorem 3. The following proposition describes the set-with-boundary data structure that is used by GenerateSample.

Proposition 6. There is a set-with-boundary data structure $S$ that supports these operations:
• add or remove a vertex $y$ from $S$ in time $O(d(y)\log\mu(S))$.

• get the value of $e(y, S)$, $\mathbf{1}(y \in S)$, or $p(y, S)$ in time $O(\log\mu(S))$.

• iterate over the vertices in the boundary $\delta(S)$ in time $O(|\delta(S)|)$.
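A minimal Python sketch of the set-with-boundary structure, following the two-dictionary design described in the proof. As a swapped-in simplification, it uses Python's built-in hash-based set and dict rather than red/black trees, so its per-operation costs are expected $O(1)$ instead of the worst-case $O(\log\mu(S))$ bounds stated in Proposition 6. The class and method names, like the example graph, are hypothetical.

```python
from fractions import Fraction


class SetWithBoundary:
    """Sketch of Proposition 6 using hash-based set/dict instead of red/black trees."""

    def __init__(self, G):
        self.G = G
        self.members = set()   # membership dictionary M (the vertices of S)
        self.bnd = {}          # boundary dictionary B: y -> e(y, S) for y in delta(S)

    def contains(self, y):
        return y in self.members

    def e(self, y):
        """Equation (8): one lookup in B and one in M."""
        if y in self.bnd:
            return self.bnd[y]
        return len(self.G[y]) if y in self.members else 0

    def p(self, y):
        """p(y, S) = (1/2) * (1(y in S) + e(y, S) / d(y))."""
        return Fraction(1, 2) * (int(self.contains(y)) + Fraction(self.e(y), len(self.G[y])))

    def boundary(self):
        """The two-sided vertex boundary delta(S)."""
        return list(self.bnd)

    def add(self, y):
        if y not in self.members:
            self._update(y, +1)

    def remove(self, y):
        if y in self.members:
            self._update(y, -1)

    def _update(self, y, sign):
        # Read e(z, S) for y and its neighbours before the membership of y changes.
        touched = {z: self.e(z) for z in self.G[y]}
        touched[y] = self.e(y)                 # unchanged, since there are no self-loops
        if sign > 0:
            self.members.add(y)
        else:
            self.members.remove(y)
        for z in self.G[y]:
            touched[z] += sign                 # y entered or left S
        for z, e_z in touched.items():         # O(d(y)) boundary-dictionary updates
            on_boundary = e_z < len(self.G[z]) if z in self.members else e_z > 0
            if on_boundary:
                self.bnd[z] = e_z
            else:
                self.bnd.pop(z, None)


G = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
S = SetWithBoundary(G)
for y in (0, 1, 2):
    S.add(y)
print(sorted(S.boundary()), S.e(0), S.p(2))  # [2, 3] 2 5/6
S.remove(2)
print(sorted(S.boundary()))                  # [0, 1, 2]
```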

Proof. The set-with-boundary data structure can be implemented using two standard dictionary 
data structures. We maintain a membership dictionary that contains the vertices in the set 
S, and a boundary dictionary B that contains the vertices in 6(S) and stores the associated value 
B{y) = e{y,S) for each y G S{S). These dictionaries must support the following operations: 
inserting and deleting a key and its value, checking whether a given key is in the dictionary, and 
looking up the value associated with a key. A red/black tree supports these operations in 0(log A^) 
worst case time per operation, where N is the number of keys currently in the tree (see [8]). 

The value of $\mathbf{1}(y \in S)$ can be computed by checking whether $y \in M$. For any vertex $y \in V$,
the value of $e(y,S)$ can be computed using two lookups, one into the membership dictionary and
one into the boundary dictionary:

$$e(y,S) = \begin{cases} B(y) & \text{if } y \in B, \\ d(y) & \text{if } y \notin B \text{ and } y \in M, \\ 0 & \text{if } y \notin B \text{ and } y \notin M. \end{cases} \qquad (8)$$

It is straightforward to compute $p(y,S)$ from $e(y,S)$ and $\mathbf{1}(y \in S)$.

Each time a node $y$ is added to or removed from $S$, we update the membership dictionary. For
each neighbor $z \sim y$ we increment or decrement the value of $e(z,S)$ in the boundary dictionary.
For each node $z$ that was updated (including $y$ and its neighbors), we determine whether the
node is contained in $\delta(S)$ by examining the values of $e(z,S)$ and $\mathbf{1}(z \in S)$, then add or remove
$z$ from the boundary dictionary when necessary. In total, adding or removing $y$ takes $O(d(y))$
dictionary operations. The size of either dictionary is $O(\mu(S))$, so each dictionary operation takes
time $O(\log \mu(S))$. □

We now describe GenerateSample. The input is a starting vertex $x$, a time limit $T > 0$, and
a budget $B > 0$. The output is a set $S_\tau$ sampled from the volume-biased ESP with the stopping
rule $\tau = \tau(T,B)$. The algorithm simulates the volume-biased ESP using the coupling described
in Proposition 4. It uses an instance $S$ of the set-with-boundary data structure to maintain the
current state $S_t$, and also stores a vertex $X$ that represents the current walk position $X_t$. Initially,
$S = S_0 = \{x\}$ and $X = X_0 = x$. The algorithm proceeds in steps. At the beginning of step $t$, we
have $S = S_{t-1}$ and $X = X_{t-1}$. The algorithm continues until the stopping time $\tau$ is reached, then
outputs $S_\tau$.

Each step has two stages. In the first stage we select $S_t$ and compute a list of the vertices
that need to be added to or removed from $S_{t-1}$ to form $S_t$. This stage requires $O(1) + O(\partial(S_{t-1}))$
operations. We stop after the first stage if $t = T$ or $\mathrm{cost}(S_0, \ldots, S_t) > B$. Otherwise, we proceed to the
second stage, in which we update $S$ to $S_t$; this requires $O(1) + O(\mu(S_t \triangle S_{t-1}))$ operations. Each
operation is either a constant-time operation or a dictionary operation requiring time $O(\log n)$.

In stage 1, we begin with $X = X_{t-1}$, then update the walk particle. Given that $X_{t-1} = x_{t-1}$,
we choose $X_t = x_t$ with probability $p(x_{t-1}, x_t)$, and update $X = x_t$. We assume that a random
neighbor of $X_{t-1}$ can be selected in constant time. We compute $p(x_t, S)$ by a lookup into the
set-with-boundary data structure, and select a random threshold $Z$ uniformly from the interval
$[0, p(x_t, S)]$. At this point we define $S_t = \{y : p(y, S_{t-1}) > Z\}$, but we do not yet update $S$ to
reflect $S_t$. Instead we create a list $D$ of the vertices in the set difference $S_t \triangle S_{t-1}$. We populate
the list by iterating over each node $y \in \delta(S)$, looking up the value of $p(y,S)$, and comparing this
value with the threshold $Z$ (vertices outside $\delta(S)$ have $p(y,S) \in \{0, 1\}$, so their membership cannot change).
While doing this, we update the values of $\mu(S_t)$ and $\mathrm{cost}(S_0, \ldots, S_t)$.
We then check whether either of the stopping conditions $t = T$ or $\mathrm{cost}(S_0, \ldots, S_t) > B$ is satisfied.
If so, then $\tau = t$, so we stop and output $S_t = S_{t-1} \triangle D$. Otherwise, we proceed to the next stage.
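The following sketch illustrates stage 1 on a toy graph: it builds the list $D$ by scanning only $\delta(S)$, and the result can be checked against the brute-force definition $S_t = \{y : p(y, S_{t-1}) > Z\}$. It recomputes $p(y,S)$ and $\delta(S)$ naively rather than through the dictionaries, and again assumes the lazy walk, so that $p(y,S) = (\mathbf{1}(y \in S) + e(y,S)/d(y))/2$, and $0 \le Z < 1$. The function names are illustrative only.

```python
from fractions import Fraction


def p_into(adj, y, S):
    """p(y, S) for the lazy walk (assumed): (1(y in S) + e(y, S) / d(y)) / 2."""
    e = sum(1 for z in adj[y] if z in S)
    return ((1 if y in S else 0) + Fraction(e, len(adj[y]))) / 2


def vertex_boundary(adj, S):
    """delta(S) = {y : 0 < p(y, S) < 1}; only S and its neighbours can qualify."""
    candidates = set(S) | {z for y in S for z in adj[y]}
    return {y for y in candidates if 0 < p_into(adj, y, S) < 1}


def stage_one_diff(adj, S, Z):
    """Stage 1(d): the list D of vertices in S_t (sym. diff.) S_{t-1}, for 0 <= Z < 1."""
    D = []
    for y in vertex_boundary(adj, S):
        joins_next = p_into(adj, y, S) > Z      # y in S_t = {y : p(y, S) > Z}
        if joins_next != (y in S):
            D.append(y)
    return D
```

On two triangles $\{0,1,2\}$ and $\{3,4,5\}$ joined by the edge $\{2,3\}$, with $S = \{0,1,2\}$, only vertices 2 and 3 lie in $\delta(S)$, so only they can enter or leave.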

In stage 2, we update $S$ to $S_t$ by adding or removing the vertices in $D$. While making these
updates to $S$, we also update $\partial(S_{t-1})$ to $\partial(S_t)$. We compute $\phi(S_t) = \partial(S_t)/\mu(S_t)$ and check whether
$\phi(S_t) < \theta_T$. If so, then $\tau = t$, so we halt and output the set $S_t$. Otherwise, we proceed to the next step.

GenerateSample$(x, T, B)$:
Input and output:

The input is a starting vertex $x$ and two integers $T > 0$ and $B > 0$.

The output is a set $S_\tau$ sampled from the volume-biased ESP with stopping rule $\tau = \tau(T,B)$.
Internal state:

$S$ = an instance of the set-with-boundary data structure.
$X$ = the current location of the random walk particle.

The algorithm also maintains the current values of $\partial(S)$, $\mu(S)$, and $\mathrm{cost}(S_0, \ldots, S)$.
Algorithm:

Initially, let $S = S_0 = \{x\}$ and $X = X_0 = x$.

At the beginning of step $t$, we have $S = S_{t-1}$ and $X = X_{t-1}$.

For $t = 1, \ldots, T$, do step $t$ as follows:

1. Stage 1. Select the vertices to add to or remove from $S$:

(a) Given $X_{t-1} = x_{t-1}$, select $X_t = x_t$ with probability $p(x_{t-1}, x_t)$ and update $X \leftarrow x_t$.

(b) Look up $p(x_t, S_{t-1})$ and pick $Z$ uniformly at random from the interval $[0, p(x_t, S_{t-1})]$.

(c) Define $S_t = \{y : p(y, S_{t-1}) > Z\}$.

(d) Compute a list $D$ of the vertices in $S_t \triangle S_{t-1}$ as follows.
For each $y \in \delta(S_{t-1})$:

i. look up $p(y, S_{t-1})$;

ii. if $y \in S_t \triangle S_{t-1}$, then add $y$ to $D$.

(e) Update $\mu(S_t)$ and $\mathrm{cost}(S_0, \ldots, S_t)$.

(f) If $t = T$ or $\mathrm{cost}(S_0, \ldots, S_t) > B$, then $\tau = t$, so halt and output $S_t = S_{t-1} \triangle D$.
Otherwise, proceed to the next stage.

2. Stage 2. Update $S$:

(a) Update $S$ to $S_t = S_{t-1} \triangle D$ by adding or removing the vertices in $D$ from $S$.

(b) Update $\partial(S_t)$ and compute $\phi(S_t) = \partial(S_t)/\mu(S_t)$.

(c) If $\phi(S_t) < \theta_T$, then $\tau = t$, so halt and output $S_t$.
Otherwise, proceed to the next step.
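Below is an end-to-end Python sketch of the GenerateSample loop. It uses plain sets instead of the set-with-boundary structure, so it illustrates the sampling and stopping logic but not the stated running time. Several details are assumptions rather than statements from this section: the walk is the lazy walk; the hypothetical parameter `theta` is passed in directly in place of $\theta_T$; and the running cost is accumulated as $\mu(S_0) + \sum_{i=1}^{t} (\partial(S_{i-1}) + \mu(S_{i-1} \triangle S_i))$, which mirrors the per-step work $c_t$ in the proof of Theorem 3 rather than quoting a definition of $\mathrm{cost}$. The function name `generate_sample` is illustrative.

```python
import random


def generate_sample(adj, x, T, B, theta, rng=None):
    """Sketch of GenerateSample(x, T, B); returns (S_tau, tau).

    Assumptions (not from the text): simple undirected graph without isolated
    vertices, lazy walk p(u, u) = 1/2 and p(u, v) = 1/(2 d(u)) for v ~ u,
    `theta` plays the role of theta_T, and the running cost is
    mu(S_0) + sum_i [boundary(S_{i-1}) + mu(S_{i-1} ^ S_i)].
    """
    rng = rng or random.Random()
    deg = {v: len(nbrs) for v, nbrs in adj.items()}

    def p_into(y, S):
        # p(y, S) = (1(y in S) + e(y, S) / d(y)) / 2 for the lazy walk.
        e = sum(1 for z in adj[y] if z in S)
        return ((1.0 if y in S else 0.0) + e / deg[y]) / 2

    def edge_boundary(S):
        # partial(S): number of edges with exactly one endpoint in S.
        return sum(1 for y in S for z in adj[y] if z not in S)

    S, X = {x}, x
    cost = deg[x]  # c_0 = d(x_0) = mu(S_0)
    for t in range(1, T + 1):
        # Stage 1(a): one step of the lazy random walk (move with probability 1/2).
        if rng.random() < 0.5:
            X = rng.choice(adj[X])
        # Stage 1(b): threshold Z uniform on [0, p(X_t, S_{t-1})].
        Z = rng.uniform(0.0, p_into(X, S))
        # Stage 1(c)-(d): only S_{t-1} and its neighbours can change membership
        # (a superset of delta(S_{t-1}), scanned naively here).
        candidates = S | {z for y in S for z in adj[y]}
        D = {y for y in candidates if (p_into(y, S) > Z) != (y in S)}
        # Stage 1(e): accumulate the (assumed) cost of step t.
        cost += edge_boundary(S) + sum(deg[y] for y in D)
        S_next = S ^ D
        # Stage 1(f): stop at the time limit or when the budget is exceeded.
        if t == T or cost > B:
            return S_next, t
        # Stage 2: commit S_t and stop if phi(S_t) = boundary / volume < theta.
        S = S_next
        if edge_boundary(S) < theta * sum(deg[y] for y in S):
            return S, t
    return S, 0  # only reached when T < 1
```

Because the walk position $X_t$ always satisfies $p(X_t, S_{t-1}) > Z$ (up to floating-point ties), it belongs to $S_t$, so the sampled sets stay non-empty.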

Proof of Theorem 3. By construction, GenerateSample simulates the coupling from Section 2.5.
By Proposition 4, the sequence $(S_0, \ldots, S_\tau)$ it generates is a sample path from the volume-biased
ESP.

Let $c_t = O(\partial(S_{t-1}) + \mu(S_{t-1} \triangle S_t))$. We will show that the number of operations performed
in step $t$ is $O(c_t)$. Each operation is either a constant-time operation or a dictionary operation
requiring time $O(\log n)$. The number of operations performed in stage 1 is dominated by part (d)
in the pseudocode, in which lookup operations are performed for each vertex in $\delta(S_{t-1})$. This
requires $O(1) + O(\partial(S_{t-1}))$ operations. The number of operations performed in stage 2 is dominated
by part (a), in which the vertices from $S_{t-1} \triangle S_t$ are added to or removed from the set-with-boundary
data structure $S$. By Proposition 6, this requires $O(1) + O(\mu(S_t \triangle S_{t-1}))$ operations. In total, the
number of operations required in step $t$ is

$$O(\partial(S_{t-1}) + \mu(S_{t-1} \triangle S_t)) + O(1) = O(c_t).$$

The $O(1)$ term above can be ignored safely because $\partial(S_{t-1}) > 0$ for $t \le \tau$. We define $c_0 = d(x_0) =
\mu(S_0)$ to account for the work required to create $S_0$. Then the total number of operations performed
by GenerateSample is $O(c_0 + \cdots + c_\tau) = O(\mathrm{cost}(S_0, \ldots, S_\tau))$.

If $\mathrm{cost}(S_0, \ldots, S_\tau) > B$, then GenerateSample halts after stage 1 during step $\tau$, and the number
of operations performed in step $\tau$ is $O(\partial(S_{\tau-1}))$. The total number of operations performed is
therefore

$$O(\mathrm{cost}(S_0, \ldots, S_{\tau-1})) + O(\partial(S_{\tau-1})) = O(2B),$$

because $\partial(S_{\tau-1}) \le \mu(S_{\tau-1}) \le \mathrm{cost}(S_0, \ldots, S_{\tau-1}) \le B$. The theorem follows. □

5 Finding balanced cuts

Spielman and Teng used their local partitioning algorithm Nibble to construct an algorithm
Partition that finds a cut with small conductance and approximately maximal volume [22]. To do
this, they constructed a subroutine called RandomNibble that applies their local partitioning
algorithm from a random starting vertex with a random budget. The time complexity of RandomNibble
is nearly independent of the size of the input graph, and the set that it outputs contains, in
expectation, a small fraction of any set in the graph that has sufficiently small conductance. In this
section, we construct an analogous subroutine called EvoNibble by making small modifications to
EvoCut. We then describe the algorithm EvoPartition that results from substituting EvoNibble
for RandomNibble in Spielman and Teng's construction of Partition.

5.1 EvoNibble

In this section we describe the subroutine EvoNibble.
EvoNibble$(\phi)$:

1. Let $T = \lfloor \phi^{-1}/100 \rfloor$, and let $\theta_T = \sqrt{4T^{-1}\log\mu(V)}$.

2. Choose a random vertex $X \in V$ with probability $\mathbb{P}(X = x) = d(x)/\mu(V)$.

3. Choose a random budget as follows. Let $J_{\max} = \lceil \log_2 \mu(V) \rceil$, and let $J$ be an integer from
$[0, J_{\max}]$ chosen with probability $\mathbb{P}(J = j) = \sigma 2^{-j}$, where $\sigma$ is a proportionality constant.
Let $B_j = 8\gamma 2^j$, where $\gamma = 1 + 4\sqrt{T \log \mu(V)}$.

4. Compute $S = $ GenerateSample$(X, T, B_J)$.

5. If $\phi(S) < 3\theta_T$ and $\mu(S) \le (3/4)\mu(V)$, then output $S$. Otherwise output $\emptyset$.

Theorem 5. The randomized algorithm EvoNibble$(\phi)$ takes as input $\phi \in (0, 1)$ and outputs a set
$S \subseteq V$. The following hold:
1. The expected complexity is $O(\phi^{-1/2} \log^{5/2} \mu(V))$.

2. Either $S = \emptyset$, or $S$ satisfies $\phi(S) = O(\sqrt{\phi \log \mu(V)})$ and $\mu(S) \le (3/4)\mu(V)$.

3. For any set $A \subseteq V$ that satisfies $\mu(A) \le (2/3)\mu(V)$ and $\phi(A) \le \phi$,

$$\mathbb{E}\left(\frac{\mu(S \cap A)}{\mu(A)}\right) \ge \frac{1}{20\mu(V)}.$$

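Proof. First we prove conclusion (1). Let $W$ be the complexity of the algorithm. By Theorem 3, we
have $\mathbb{E}(W \mid J) = O(B_J \log \mu(V)) = O(\gamma 2^J \log \mu(V))$, where $\gamma = O(\phi^{-1/2} \log^{1/2} \mu(V))$. The expected
complexity is

$$\begin{aligned}
\mathbb{E}(W) &= \sum_{j=0}^{J_{\max}} \mathbb{E}(W \mid J = j)\, \mathbb{P}(J = j) \\
&= \sum_{j=0}^{J_{\max}} O(\gamma 2^j \log \mu(V))\, O(2^{-j}) \\
&= O(\gamma \log \mu(V)\, J_{\max}) \\
&= O(\phi^{-1/2} \log^{5/2} \mu(V)).
\end{aligned}$$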
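The parameter choices in steps 1 to 3 of EvoNibble, and the reason the expected budget is only $O(\gamma J_{\max})$, can be sketched in Python: each term $B_j\,\mathbb{P}(J = j)$ equals $8\gamma\sigma$, so $\mathbb{E}(B_J) = 8\gamma\sigma(J_{\max} + 1)$ with $\sigma \le 1$. The function names are hypothetical, $\theta_T$ is not computed, and using `math.log` (the natural logarithm) for $\log\mu(V)$ is an assumption.

```python
import math
import random
from fractions import Fraction


def evonibble_parameters(adj, phi, rng):
    """Steps 1-3 of EvoNibble (sketch): returns (T, X, J, B_J); theta_T is omitted."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    vol = sum(deg.values())                                     # mu(V)
    T = math.floor(1 / (100 * phi))                             # T = floor(phi^{-1} / 100)
    vertices = list(adj)
    # Step 2: start vertex with P(X = x) = d(x) / mu(V).
    X = rng.choices(vertices, weights=[deg[v] for v in vertices])[0]
    # Step 3: J in [0, J_max] with P(J = j) proportional to 2^{-j}.
    j_max = math.ceil(math.log2(vol))
    J = rng.choices(range(j_max + 1), weights=[2.0 ** -j for j in range(j_max + 1)])[0]
    gamma = 1 + 4 * math.sqrt(T * math.log(vol))                # natural log assumed
    return T, X, J, 8 * gamma * 2 ** J                          # B_J = 8 gamma 2^J


def expected_budget(gamma, j_max):
    """Exact E(B_J) and sigma; every term B_j P(J = j) equals 8 * gamma * sigma."""
    sigma = 1 / sum(Fraction(1, 2 ** j) for j in range(j_max + 1))
    expectation = sum(8 * gamma * 2 ** j * sigma * Fraction(1, 2 ** j)
                      for j in range(j_max + 1))
    return expectation, sigma
```

For example, `expected_budget(1, 3)` gives $\sigma = 8/15$ and $\mathbb{E}(B_J) = 256/15 = 8 \cdot (8/15) \cdot 4$.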

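Conclusion (2) is immediate from the definition of the algorithm. We now prove conclusion (3).
Let $S_{\mathrm{out}}$ be the output of EvoNibble. Let $X$ be the starting vertex. Let $A$ be a set that satisfies
the requirements of conclusion (3), and let $A_T \subseteq A$ be the subset described in Proposition 5. We
will prove the following:

$$\text{if } x \in A_T, \text{ then } \mathbb{E}(\mu(S_{\mathrm{out}} \cap A) \mid X = x) \ge 1/10. \qquad (9)$$

After that, conclusion (3) follows by taking the expectation over the choice of the starting vertex:

$$\mathbb{E}(\mu(S_{\mathrm{out}} \cap A)) = \sum_{x \in V} \mathbb{E}(\mu(S_{\mathrm{out}} \cap A) \mid X = x)\, \mathbb{P}(X = x) \ge (1/10)\, \mathbb{P}(X \in A_T) \ge (1/20)\, \mu(A)/\mu(V),$$

where the last inequality uses $\mathbb{P}(X \in A_T) = \mu(A_T)/\mu(V)$ and $\mu(A_T) \ge \mu(A)/2$.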

We now prove (9). Consider a sample path from the volume-biased ESP started from $\{X\}$, and let
$\tau = \tau(T, \infty)$. Let $D$ be the event that all of the following hold:

1. $\mathrm{cost}(S_0, \ldots, S_\tau) \le 4\gamma \mu(S_\tau)$,

2. $\phi(S_\tau) < 3\theta_T$,

3. $\mu(S_\tau) \le (3/4)\mu(V)$,

4. $\mu(S_\tau \cap A) \ge (9/10)\mu(S_\tau)$.

Combining Theorem 4 and Theorem 2 shows that if $x \in A_T$, then $\mathbb{P}(D \mid X = x) \ge 1/4$. The
subroutine GenerateSample$(X, T, B)$ returns the set $S_{\tau(T,B)}$ rather than $S_\tau$. To deal with this, we
define the events $D_j = D \wedge (\mu(S_\tau) \in [2^j, 2^{j+1}))$. Note that, on the event $D_j$,

$$\mathrm{cost}(S_0, \ldots, S_\tau) \le 4\gamma\mu(S_\tau) < 8\gamma 2^j = B_j.$$
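This implies that if the event $D_j \wedge (J \ge j)$ holds, then $S_{\mathrm{out}} = S_{\tau(T,B_J)} = S_\tau$ (conditions 2 and 3
of $D$ ensure that $S_\tau$ passes the test in step 5), and furthermore $\mu(S_{\mathrm{out}} \cap A) \ge (9/10)2^j$. For any
$x \in A_T$, we have

$$\begin{aligned}
\mathbb{E}(\mu(S_{\mathrm{out}} \cap A) \mid X = x) &\ge \sum_{j \in [0, J_{\max}]} \mathbb{E}(\mu(S_{\mathrm{out}} \cap A) \mid X = x, J = j, D_j)\, \mathbb{P}(D_j \mid X = x)\, \mathbb{P}(J = j) \\
&\ge (9/10)\,\sigma \sum_{j \in [0, J_{\max}]} \mathbb{P}(D_j \mid X = x) \\
&\ge (9/10)\,\sigma\, \mathbb{P}(D \mid X = x) \\
&\ge 1/10.
\end{aligned}$$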

This establishes (9) and completes the proof. □ 
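For the record, the constants in the final inequality of the proof of (9) work out as follows; the computation uses only $\mathbb{P}(J = j) = \sigma 2^{-j}$ on $[0, J_{\max}]$ and $\mathbb{P}(D \mid X = x) \ge 1/4$:

```latex
\sigma = \Bigl(\sum_{j=0}^{J_{\max}} 2^{-j}\Bigr)^{-1} = \frac{1}{2 - 2^{-J_{\max}}} > \frac{1}{2},
\qquad
\frac{9}{10}\,\sigma\,\mathbb{P}(D \mid X = x) > \frac{9}{10}\cdot\frac{1}{2}\cdot\frac{1}{4} = \frac{9}{80} > \frac{1}{10}.
```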
5.2 EvoPartition 

The algorithm EvoPartition described in the following theorem can be constructed by substituting 
the subroutine EvoNibble for RandomNibble in Spielman and Teng's algorithm Partition. We 
omit the proof of Theorem 6 and the description of the algorithm, and refer the reader to Theorem 
3.2 in [22]. At a high level, the algorithm applies the nibbling subroutine, removes the resulting 
cut from the graph, then repeats. It stops after $O(m \,\mathrm{polylog}(n))$ steps or when a large fraction of
the graph has been removed. 
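The nibble-remove-repeat loop can be sketched as a hypothetical Python skeleton. It is not Spielman and Teng's Partition: the nibbling routine is passed in as a parameter, and `max_rounds` and `stop_fraction` are illustrative stand-ins for the iteration bound and the "large fraction" threshold (for the actual parameters see [22]). Measuring the removed volume with degrees in the original graph is an assumption of this sketch.

```python
def partition_loop(adj, nibble, max_rounds, stop_fraction):
    """Hypothetical nibble-remove-repeat skeleton; returns the union of removed vertex sets.

    `nibble(graph)` returns a (possibly empty) vertex set of the current graph.
    `max_rounds` and `stop_fraction` are illustrative parameters, not those of [22].
    """
    total_volume = sum(len(nbrs) for nbrs in adj.values())    # mu(V) in the original graph
    current = {v: list(nbrs) for v, nbrs in adj.items()}
    removed = set()
    for _ in range(max_rounds):
        cut = nibble(current)
        if cut:
            removed |= cut
            # Remove the cut's vertices, and every edge touching them, from the graph.
            current = {v: [z for z in nbrs if z not in cut]
                       for v, nbrs in current.items() if v not in cut}
        # Stop once a large fraction of the original volume has been removed.
        if sum(len(adj[v]) for v in removed) >= stop_fraction * total_volume:
            break
    return removed
```

In EvoPartition, the role of `nibble` would be played by EvoNibble.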

Theorem 6. The randomized algorithm EvoPartition$(\phi)$ takes as input $\phi \in (0, 1)$, and it outputs
a set $S \subseteq V$. The expected complexity is $O(m\phi^{-1/2}\,\mathrm{polylog}(n))$. With probability at least 1/2, both
of the following hold:

1. $\phi(S) = O(\sqrt{\phi \log m})$ and $\mu(S) \le (7/8)\mu(V)$.

2. At least one of the following holds:

(a) $\mu(S) \ge (1/4)\mu(V)$;

(b) For any set $A \subseteq V$ that satisfies $\phi(A) \le \phi$ and $\mu(A) \le (2/3)\mu(V)$, we have $\mu(S \cap A) \ge
\mu(A)/2$.

We remark that the complexity of EvoPartition can be reduced from $O(m\phi^{-1/2}\,\mathrm{polylog}(n))$
to $(m + n\phi^{-1/2}) \cdot O(\mathrm{polylog}(n))$ by applying the sparsification technique of Benczúr-Karger [6].

In Table 2, we summarize the complexity and approximation guarantee of selected algorithms 
for the balanced cut problem. For all the algorithms listed, we first apply the Benczúr-Karger [6]
sparsification technique to the graph. We state the running times in terms of $\phi$, which for the first
four algorithms is specified as part of the input (see Theorem 6). The next three algorithms solve
a different formulation of the balanced cut problem, where the volume of the set $A \subseteq V$ is specified
rather than the conductance. For the purpose of comparison, we translate their approximation 
guarantees to our formulation. See the original papers for the precise statements. 

Balanced cut algorithm              | Complexity                                                       | Approximation
Partition (ST04) [20]               | $(m + n\phi^{-5/3}) \cdot O(\mathrm{polylog}(n))$                 | $O(\phi^{1/3} \log^{2/3} n)$
Partition (ST08) [22]               | $(m + n\phi^{-2}) \cdot O(\mathrm{polylog}(n))$                   | $O(\phi^{1/2} \log^{3/2} n)$
PageRankPartition (ACL06) [1]       | $(m + n\phi^{-1}) \cdot O(\mathrm{polylog}(n))$                   | $O(\phi^{1/2} \log^{1/2} n)$
EvoPartition (this paper)           | $(m + n\phi^{-1/2}) \cdot O(\mathrm{polylog}(n))$                 | $O(\phi^{1/2} \log^{1/2} n)$
Arora-Hazan-Kale [3]                | $O(n^2\,\mathrm{polylog}(n))$                                     | $O(\phi \log^{1/2} n)$
Arora-Kale [4]                      | $(m + \min\{n^{3/2}, n\phi^{-1}\}) \cdot O(\mathrm{polylog}(n))$  | $O(\phi \log n)$
Orecchia et al. [18]                | $(m + \min\{n^{3/2}, n\phi^{-1}\}) \cdot O(\mathrm{polylog}(n))$  | $O(\phi \log n)$
Recursive spectral (power method)   | $O(n^2 \lambda^{-1}\,\mathrm{polylog}(n))$                        | $O(\phi^{1/2})$
Recursive spectral (Lanczos)        | $O(n^2 \lambda^{-1/2}\,\mathrm{polylog}(n))$                      | $O(\phi^{1/2})$
Recursive spectral (ST solver) [21] | $O(n^2\,\mathrm{polylog}(n))$                                     | $O(\phi^{1/2})$

Table 2: Comparison of selected algorithms for finding balanced cuts. Here $n = |V|$, $m = |E|$, and
$\phi$ is an input to the algorithm.

Spielman and Teng's algorithm Partition [20, 22] was the first balanced cut algorithm with
nearly-linear complexity. The algorithm of Khandekar-Rao-Vazirani [11] finds a balanced cut
by solving single-commodity flow problems, and outputs a cut of conductance $O(\phi \log^2 n)$ in time
$O(T_{\mathrm{flow}}\,\mathrm{polylog}(n))$, where $T_{\mathrm{flow}} = (m + \min(n^{3/2}, n\phi^{-1})) \cdot O(\mathrm{polylog}(n))$. The algorithm of Arora
and Kale [4] outputs a cut of conductance $O(\phi \log n)$ in time $O(T_{\mathrm{flow}}\,\mathrm{polylog}(n))$. The algorithm
of Orecchia et al. [18] obtains the same running time and approximation within the cut-matching
framework of [11]. The best approximation is $O(\phi \log^{1/2} n)$, due to Arora-Rao-Vazirani [5]. The
fastest algorithm that attains this approximation is due to Arora-Hazan-Kale [3].

Spectral partitioning methods can be applied recursively to find balanced cuts (see [19]). As far
as we know, there is no good way to lower bound the balance of the cut output by the basic spectral
partitioning method. As a result, the best known bound on the depth of the recursion is $\Omega(n)$, and
the best known time bound for finding balanced cuts using recursive spectral partitioning is $\Omega(n^2)$.
Let $\phi^* = \min\{\phi(A) : A \subseteq V, \mu(A) \le \mu(V)/2\}$. Let $\lambda$ be the smallest nonzero eigenvalue of the
normalized Laplacian, which satisfies $\lambda \le 2\phi^*$. The basic spectral partitioning method produces
a set of conductance $O(\sqrt{\lambda}) = O(\sqrt{\phi^*})$ (see [14, 7]). If the power method is used to compute
an approximate eigenvector, then the complexity of finding an unbalanced cut using the spectral
method is $O(n\lambda^{-1}\,\mathrm{polylog}(n))$. If the Lanczos algorithm is used instead, then the complexity
improves to $O(n\lambda^{-1/2}\,\mathrm{polylog}(n))$. The complexity can be further improved to $O(n\,\mathrm{polylog}(n))$ by
using the linear system solver of Spielman-Teng to compute the pseudo-inverse, then applying the
inverse power method (see [21]).
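A minimal Python sketch of one level of the basic spectral method discussed above: power iteration on the lazy normalized adjacency matrix $(I + D^{-1/2} A D^{-1/2})/2$ with its top eigenvector $\sqrt{d}$ projected out, followed by a sweep over the vertices ordered by $v(u)/\sqrt{d(u)}$. Since the relevant eigenvalue of this matrix is $1 - \lambda/2$, the number of iterations needed grows like $\lambda^{-1}$ (up to logarithmic factors), which is where the $\lambda^{-1}$ in the power-method bound comes from; here the iteration count is a fixed, hypothetical parameter. The sweep scores prefixes by $\partial(S)/\min(\mu(S), \mu(V) - \mu(S))$. This is neither the Lanczos nor the Spielman-Teng-solver variant, and it does not recurse.

```python
import math
import random


def spectral_sweep(adj, iterations=200, seed=0):
    """Power-method spectral partitioning followed by a sweep cut (sketch).

    Approximates the second eigenvector of the lazy normalized adjacency
    (I + D^{-1/2} A D^{-1/2}) / 2, whose top eigenvector sqrt(d) is projected out.
    Returns (best_set, best_conductance), scoring boundary / min(mu(S), mu(V) - mu(S)).
    """
    vertices = list(adj)
    deg = {v: len(adj[v]) for v in vertices}
    top = {v: math.sqrt(deg[v]) for v in vertices}
    top_sq = sum(t * t for t in top.values())
    rng = random.Random(seed)

    def orthonormalize(x):
        # Remove the component along sqrt(d), then rescale to unit length.
        dot = sum(x[v] * top[v] for v in vertices) / top_sq
        y = {v: x[v] - dot * top[v] for v in vertices}
        norm = math.sqrt(sum(c * c for c in y.values())) or 1.0
        return {v: c / norm for v, c in y.items()}

    vec = orthonormalize({v: rng.uniform(-1.0, 1.0) for v in vertices})
    for _ in range(iterations):
        # One multiplication by the lazy normalized adjacency matrix: O(m) work.
        vec = {v: 0.5 * vec[v] + 0.5 * sum(vec[z] / math.sqrt(deg[v] * deg[z]) for z in adj[v])
               for v in vertices}
        vec = orthonormalize(vec)

    # Sweep over proper prefixes of the vertices ordered by vec[v] / sqrt(d(v)).
    order = sorted(vertices, key=lambda v: vec[v] / math.sqrt(deg[v]), reverse=True)
    total = sum(deg.values())
    S, volume, boundary = set(), 0, 0
    best_set, best_phi = None, float("inf")
    for v in order[:-1]:
        inside = sum(1 for z in adj[v] if z in S)  # e(v, S) before adding v
        S.add(v)
        volume += deg[v]
        boundary += deg[v] - 2 * inside
        phi = boundary / min(volume, total - volume)
        if phi < best_phi:
            best_set, best_phi = set(S), phi
    return best_set, best_phi
```

On two triangles joined by a single edge, the sweep recovers one of the triangles, whose conductance is $1/7$.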